Goto

Collaborating Authors

 Web


Amazon's cloud 'hit by two outages caused by AI tools last year'

The Guardian

A technician works at an Amazon Web Services AI datacentre in New Carlisle, Indiana. A technician works at an Amazon Web Services AI datacentre in New Carlisle, Indiana. Amazon's cloud'hit by two outages caused by AI tools last year' Reported issues at Amazon Web Services raise questions about firm's use of artificial intelligence as it cuts staff Amazon's huge cloud computing arm reportedly experienced at least two outages caused by its own artificial intelligence tools, raising questions about the company's embrace of AI as it lays off human employees. A 13-hour interruption to Amazon Web Services' (AWS) operations in December was caused by an AI agent autonomously choosing to "delete and then recreate" a part of its environment, the Financial Times reported. AWS, which provides vital infrastructure for much of the internet, suffered several outages last year.




Make Microsoft's CEO cry by installing Chrome's 'Microslop' extension

PCWorld

PCWorld reports on a Chrome extension called "Microsoft to Microslop" that renames Microsoft references in browsers as a protest against the company's aggressive AI integration. The extension reflects widespread user frustration with Microsoft's Copilot AI, which faces extremely low adoption rates and growing privacy concerns among Windows users. Many users actively seek ways to remove AI features from Windows, highlighting significant backlash against Microsoft's AI strategy despite CEO dismissals of complaints. Microsoft CEO Satya Nadella says we shouldn't think of LLM output as "slop." You know, AI-generated content, the thing that's making the internet worse in every measurable way, and causing consumer electronics prices to skyrocket? So it would be a real shame if you installed an extension in your browser that changed "Microsoft" to "Microslop" all over the web. Yes, installing " Microsoft to Microslop " would be a naughty and entirely cynical response. Especially if you, say, used Edge's Chromium base to install it in Microsoft's own default web browser, Edge. That would just be twisting the AI-generated knife, wouldn't it?


Causal-driven attribution (CDA): Estimating channel influence without user-level data

Filippou, Georgios, Quach, Boi Mai, Lenghel, Diana, White, Arthur, Jha, Ashish Kumar

arXiv.org Machine Learning

Attribution modelling lies at the heart of marketing effectiveness, yet most existing approaches depend on user-level path data, which are increasingly inaccessible due to privacy regulations and platform restrictions. This paper introduces a Causal-Driven Attribution (CDA) framework that infers channel influence using only aggregated impression-level data, avoiding any reliance on user identifiers or click-path tracking. CDA integrates temporal causal discovery (using PCMCI) with causal effect estimation via a Structural Causal Model to recover directional channel relationships and quantify their contributions to conversions. Using large-scale synthetic data designed to replicate real marketing dynamics, we show that CDA achieves an average relative RMSE of 9.50% when given the true causal graph, and 24.23% when using the predicted graph, demonstrating strong accuracy under correct structure and meaningful signal recovery even under structural uncertainty. CDA captures cross-channel interdependencies while providing interpretable, privacy-preserving attribution insights, offering a scalable and future-proof alternative to traditional path-based models.


TheMCPCompany: Creating General-purpose Agents with Task-specific Tools

Esfandiarpoor, Reza, Suryanarayanan, Vishwas, Bach, Stephen H., Chowdhary, Vishal, Aue, Anthony

arXiv.org Artificial Intelligence

Since the introduction of the Model Context Protocol (MCP), the number of available tools for Large Language Models (LLMs) has increased significantly. These task-specific tool sets offer an alternative to general-purpose tools such as web browsers, while being easier to develop and maintain than GUIs. However, current general-purpose agents predominantly rely on web browsers for interacting with the environment. Here, we introduce TheMCPCompany, a benchmark for evaluating tool-calling agents on tasks that involve interacting with various real-world services. We use the REST APIs of these services to create MCP servers, which include over 18,000 tools. We also provide manually annotated ground-truth tools for each task. In our experiments, we use the ground truth tools to show the potential of tool-calling agents for both improving performance and reducing costs assuming perfect tool retrieval. Next, we explore agent performance using tool retrieval to study the real-world practicality of tool-based agents. While all models with tool retrieval perform similarly or better than browser-based agents, smaller models cannot take full advantage of the available tools through retrieval. On the other hand, GPT-5's performance with tool retrieval is very close to its performance with ground-truth tools. Overall, our work shows that the most advanced reasoning models are effective at discovering tools in simpler environments, but seriously struggle with navigating complex enterprise environments. TheMCPCompany reveals that navigating tens of thousands of tools and combining them in non-trivial ways to solve complex problems is still a challenging task for current models and requires both better reasoning and better retrieval models.


An Index-based Approach for Efficient and Effective Web Content Extraction

Chen, Yihan, Xu, Benfeng, Wang, Xiaorui, Mao, Zhendong

arXiv.org Artificial Intelligence

As web agents (e.g., Deep Research) routinely consume massive volumes of web pages to gather and analyze information, LLM context management -- under large token budgets and low signal density -- emerges as a foundational, high-importance, and technically challenging problem for agentic and RAG pipelines. Existing solutions for extracting relevant content are inadequate: generative extraction models suffer from high latency, rule-based heuristics lack adaptability, and chunk-and-rerank methods are blind to webpage structure. To overcome these issues, we introduce Index-based Web Content Extraction to reframe the extraction process from slow, token-by-token generation into a highly efficient, discriminative task of index prediction, achieving both effectiveness and efficiency. We partition HTML into structure-aware, addressable segments, and extract only the positional indices of content relevant to a given query. This method decouples extraction latency from content length, enabling rapid, query-relevant extraction. We first evaluate our method as a post-retrieval processing component within an RAG QA system and find that it improves QA accuracy. Then we directly measure its match rate with the target content in two scenarios: main content extraction (ME) and query-relevant extraction (QE). Experimental results show that our method outperforms existing works in both accuracy and speed, effectively bridging the gap between LLMs and the vast webpages.


WebMall -- A Multi-Shop Benchmark for Evaluating Web Agents [Technical Report]

Peeters, Ralph, Steiner, Aaron, Schwarz, Luca, Caspary, Julian Yuya, Bizer, Christian

arXiv.org Artificial Intelligence

LLM-based web agents have the potential to automate long-running web tasks, such as searching for products in multiple e-shops and subsequently ordering the cheapest products that meet the users needs. Benchmarks for evaluating web agents either require agents to perform tasks online using the live Web or offline using simulated environments, which allow for the exact reproduction of the experimental setup. While DeepShop provides an online benchmark that requires agents to perform challenging shopping tasks, existing offline benchmarks such as WebShop, WebArena, or Mind2Web cover only comparatively simple e-commerce tasks that need to be performed against a single shop containing product data from a single source. What is missing is an e-commerce benchmark that simulates multiple shops containing heterogeneous product data and requires agents to perform complex tasks. We fill this gap by introducing WebMall, the first offline multi-shop benchmark for evaluating web agents on challenging comparison shopping tasks. WebMall consists of four simulated shops populated with product data extracted from the Common Crawl. The WebMall tasks range from specific product searches and price comparisons to advanced queries for complementary or substitute products, as well as checkout processes. We validate WebMall using eight agents that differ in observation space, availability of short-term memory, and the employed LLM. The validation highlights the difficulty of the benchmark, with even the best-performing agents achieving task completion rates below 55% in the task categories cheapest product search and vague product search.


BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents

Zhang, Kaiyuan, Tenenholtz, Mark, Polley, Kyle, Ma, Jerry, Yarats, Denis, Li, Ninghui

arXiv.org Artificial Intelligence

The integration of artificial intelligence (AI) agents into web browsers introduces security challenges that go beyond traditional web application threat models. Prior work has identified prompt injection as a new attack vector for web agents, yet the resulting impact within real-world environments remains insufficiently understood. In this work, we examine the landscape of prompt injection attacks and synthesize a benchmark of attacks embedded in realistic HTML payloads. Our benchmark goes beyond prior work by emphasizing injections that can influence real-world actions rather than mere text outputs, and by presenting attack payloads with complexity and distractor frequency similar to what real-world agents encounter. We leverage this benchmark to conduct a comprehensive empirical evaluation of existing defenses, assessing their effectiveness across a suite of frontier AI models. We propose a multi-layered defense strategy comprising both architectural and model-based defenses to protect against evolving prompt injection attacks. Our work offers a blueprint for designing practical, secure web agents through a defense-in-depth approach.


Toward an AI-Native Internet: Rethinking the Web Architecture for Semantic Retrieval

Bilal, Muhammad, Qazi, Zafar, Canini, Marco

arXiv.org Artificial Intelligence

The rise of Generative AI Search is fundamentally transforming how users and intelligent systems interact with the Internet. LLMs increasingly act as intermediaries between humans and web information. Yet the web remains optimized for human browsing rather than AI-driven semantic retrieval, resulting in wasted network bandwidth, lower information quality, and unnecessary complexity for developers. We introduce the concept of an AI-Native Internet, a web architecture in which servers expose semantically relevant information chunks rather than full documents, supported by a Web-native semantic resolver that allows AI applications to discover relevant information sources before retrieving fine-grained chunks. Through motivational experiments, we quantify the inefficiencies of current HTML-based retrieval, and outline architectural directions and open challenges for evolving today's document-centric web into an AI-oriented substrate that better supports semantic access to web content.